Statistical modelling in R for ecologists (Part 2)

Made for

Bush Heritage

Date

March 28, 2024

Any questions about these course notes, contact the author here

Here we build upon modelling skills developed in Part 1 of this course. In Part 1, we learned how to fit models to data using Frequentist statistics.

Here we will fit the same models in a Bayesian paradigm. Then we will build more complex, multivariate generalised linear models to demonstrate when they might be preferred over Frequentist approaches.

We’ll also compare multivariate generalised linear models (i.e., model-based ordination) to other algorithmic approaches typically used for analysing ecological community data (e.g., nMDS).

Finally, we’ll show how these techniques can be used to model joint species distributions, and how R can handle spatial data.

Download the models you’ll need for today from here

Frequentist vs. Bayesian

Find more poorly drawn lines comics here

Diving into the fundamental and nuanced differences between Frequentist and Bayesian statistics is beyond the scope of this short course. But you can find some nice explanations of the key differences here and here.

From my point of view, both are useful. Often, I start by fitting Frequentist models to my data and then move to more complex Bayesian models if needed.

Why?

  • Bayesian models require more ‘number crunching’ to estimate parameters (coefficients), and therefore can take a long time to fit compared to their Frequentist analogues.
  • Bayesian models require us to define ‘priors’ for our model parameters. If we don’t have much prior information on model parameters (i.e., our priors are ‘uninformative’, which is usually the default), the coefficients estimated by Bayesian models should be the same as their Frequentist analogues.

Sometimes though, we need to go Bayesian. For example, if we want to model spatio-temporal autocorrelation in our data, we might like to use Bayesian models in the R-INLA package.

Another advantage of Bayesian models is that our parameters are treated as random variables rather than fixed quantities. In practical terms this means that, rather than a single point estimate for each model coefficient (+/- confidence intervals), we get an entire probability distribution of plausible values for each coefficient (referred to as the posterior distribution). This provides us with a richer understanding of our coefficients, and therefore of the ecological relationships we are trying to estimate.
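As a toy illustration of what a posterior gives you (simulated draws here, not real model output), a vector of posterior samples for a slope coefficient can be summarised into a point estimate, a credible interval, and direct probability statements:

```r
set.seed(1)
# hypothetical posterior draws for a slope coefficient (a stand-in for real MCMC output)
posterior_b1 <- rnorm(4000, mean = -0.7, sd = 0.14)

mean(posterior_b1)                      # point estimate (posterior mean)
quantile(posterior_b1, c(0.025, 0.975)) # 95% credible interval
mean(posterior_b1 < 0)                  # posterior probability the slope is negative
```

That last line is something a Frequentist confidence interval can't give you directly: the probability, given the data and priors, that the effect is negative.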

Here, we’ll go Bayesian because our ultimate goal is to fit a complex multivariate generalised linear model and account for spatial autocorrelation in our observations by including a spatial random effect.

Before we do though, we need to learn a little bit more about how Bayesian models estimate parameters.

Bayesian vs. Frequentist parameter estimation

In Part 1 of this course we touched on how model parameters (coefficients) are estimated with either ordinary least squares (general linear models) or maximum likelihood (generalised linear models).

The Bayesian models we’ll use today use Markov Chain Monte Carlo (MCMC) sampling to estimate the posterior distribution for each parameter (coefficient).

Here is a brief overview of the key differences between Frequentist and Bayesian parameter estimation.

We don’t have time to discuss the details today, but the key things you need to know when fitting a Bayesian model with MCMC sampling are:

  1. MCMC sampling is an algorithm that results in a chain of samples that form a probability distribution which, in our case, is the posterior distribution for each of our model parameters. Read more here

  2. We have to set the following MCMC sampling parameters:

     1. the number of chains (typically 4),
     2. how many samples of the posterior to obtain in each chain (typically 1000),
     3. the thinning rate (how often we keep samples in each chain, e.g., a thinning rate of 1 means we keep every sample, a rate of 5 means we keep every 5th sample),
     4. the length of the transient (also called ‘burn-in’) - this is how many initial samples we discard from each chain.

  3. After we set the MCMC sampling parameters and fit the model, we need to check that the MCMC sampling has converged. This means that, in addition to checking the traditional structural model assumptions for linear models like we learned in Part 1 (i.e., normality and homoskedasticity of residuals), we also need to check MCMC convergence diagnostics before making any inference or predictions from our model.
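To make these moving parts concrete, here is a minimal random-walk Metropolis sampler in base R for the mean of some normal data. This is a generic sketch of the MCMC recipe (not what {Hmsc} does internally), but it shows exactly where the transient (burn-in) and thinning rate enter:

```r
set.seed(42)
obs <- rnorm(50, mean = 3, sd = 1) # some data; we want the posterior of the mean

# log posterior: normal likelihood (sd known = 1) with a flat prior on mu
log_post <- function(mu) sum(dnorm(obs, mean = mu, sd = 1, log = TRUE))

n_iter <- 12000; transient <- 2000; thin <- 10
chain <- numeric(n_iter)
mu <- 0 # starting value
for (i in 1:n_iter) {
  prop <- mu + rnorm(1, 0, 0.5) # random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(mu)) mu <- prop # accept/reject
  chain[i] <- mu
}
# discard the burn-in, then keep every 'thin'-th sample
posterior <- chain[seq(transient + 1, n_iter, by = thin)]
length(posterior) # 1000 retained samples
mean(posterior)   # should be close to the sample mean of obs
```

With transient = 2000 and thin = 10, the 12,000 raw iterations boil down to 1000 retained posterior samples for the chain - the same arithmetic behind the sampleMcmc() arguments we set in this course.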

Okay, now we’re ready to try fitting a Bayesian model.

General(ised) linear models in a Bayesian framework

In Part 1, we fitted general and generalised linear models in a Frequentist framework. (Remember: general linear models assume a normal error structure (think tree circumference as our response variable \(y\)), whereas generalised linear models can accommodate non-normal error structures (think species counts or presence-absence as the response \(y\)).)

Let’s try fitting the same models in a Bayesian framework using MCMC sampling for parameter (coefficient) estimation. We’ll use weak, uninformative priors (the default), meaning our coefficient estimates from the Bayesian model should be the same as the Frequentist.

In the interest of time, we’ll just repeat the generalised linear model of species abundance in relation to fire severity that we did in Part 1. But if you’re keen, feel free to try the general linear model of tree circumference (you’ll just need to specify a normal (gaussian) distribution instead of a poisson).

First, let’s simulate the data we need for species abundance and fire severity (repeating exactly as we did in Part 1).

library(ggplot2)
set.seed(123) # set seed for random number generator to ensure results are reproducible

b0 <- log(10) # average species abundance when fire severity = 0 (i.e., the intercept) is 10
b1 <- -log(1/0.5) # species abundance decreases by a factor of 2 (aka 50% decrease in abundance) for a one unit increase in fire severity
n <- 100 # number of observations
X <- runif(n, min = 0, max = 1) # fire severity measurements
lambda <- exp(b0 + b1*X) # estimate mean species abundance (lambda) on multiplicative scale using the exponent
y <- rpois(n, lambda) # simulate species abundance observation from each lambda
df <- data.frame(Fire_severity = X, Species_abundance = y) # make a dataframe of observations

ggplot(df) +
  aes(x = Fire_severity, y = Species_abundance) +
  geom_point() +
  ylab('Species abundance') +
  xlab('Fire severity') +
  geom_smooth(method = 'lm') +
  #geom_smooth(method = 'loess') +
  theme_classic()
`geom_smooth()` using formula = 'y ~ x'

Fitting a Bayesian GLM and checking MCMC convergence

Now we can fit the model. We’ll use {Hmsc} to do this - a flexible Bayesian modelling framework for fitting complex hierarchical models with spatial or non-spatial random effects. ‘Hmsc’ is an acronym for ‘Hierarchical modelling of species communities’. We can use it to fit univariate models (as we are right now) and multivariate models (as we’ll do later).

Let’s load the package and fit a Bayesian generalised linear model where our response \(y\) is species abundance and our explanatory variable \(X\) is fire severity.

library(Hmsc)
Loading required package: coda
Warning in .recacheSubclasses(def@className, def, env): undefined subclass
"ndiMatrix" of class "replValueSp"; definition not updated
# set up the model
m <- Hmsc(Y = as.matrix(df$Species_abundance),
          XData = df,
          XFormula = ~ Fire_severity,
          distr = 'poisson')

# fit the model with MCMC sampling
m_fit <- sampleMcmc(m, nChains = 4, samples = 1000, thin = 1, transient = 2500, verbose = F)
setting updater$GammaEta=FALSE due to absence of random effects included to the model
Computing chain 1
Computing chain 2
Computing chain 3
Computing chain 4

So, as you can see, this takes a bit longer than our Frequentist GLMs. Now we want to check whether the MCMC sampling has converged.

First, we’ll extract the posterior distributions for the coefficients estimated by our model (i.e., the y-intercept and the beta coefficients).

mpost <- convertToCodaObject(m_fit)

Now we can look at trace plots of the MCMC sampling for each of the posterior distributions to see whether they have converged. We expect to see random mixing of the four MCMC chains (a fuzzy caterpillar), and bell-shaped, unimodal posterior distributions.

plot(mpost$Beta)

Not ideal. The chains are not mixing very well, and the posterior distributions are not bell-shaped (they are a bit too pointy, more like a witch’s hat).

We can also get some quantitative convergence diagnostics, such as the effective sample size of the MCMC chains, to help us diagnose the issue. We would expect the effective sample size to be close to 4000 (1000 samples/chain * 4 chains). If it is less, that indicates there is auto-correlation in the MCMC sampling.

effectiveSize(mpost$Beta)
  B[(Intercept) (C1), sp1 (S1)] B[Fire_severity (C2), sp1 (S1)] 
                       107.1974                        114.8919 

Whoa, much less - we have an issue with autocorrelation in the sampling.
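To see why autocorrelation shrinks the effective sample size, here's a toy example (independent of our model): 4000 draws from a strongly autocorrelated AR(1) process carry far fewer than 4000 samples' worth of information. The ess() function below is a crude initial-positive-sequence style approximation, not coda's exact estimator:

```r
set.seed(123)
n_draws <- 4000
iid_chain <- rnorm(n_draws)                                      # independent draws
ar_chain  <- as.numeric(arima.sim(list(ar = 0.95), n = n_draws)) # autocorrelated chain

# crude ESS: n / (1 + 2 * sum of the leading positive autocorrelations)
ess <- function(x, max_lag = 100) {
  rho <- acf(x, lag.max = max_lag, plot = FALSE)$acf[-1]
  first_neg <- which(rho < 0)[1]
  if (is.na(first_neg)) first_neg <- length(rho) + 1
  length(x) / (1 + 2 * sum(rho[seq_len(first_neg - 1)]))
}
ess(iid_chain) # close to the nominal 4000
ess(ar_chain)  # far smaller, as with our model above
```

The fix is the same as for our model: keep chains longer, or thin them so the retained samples are closer to independent.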

We can also check the Gelman-Rubin potential scale reduction factors (which should be close to 1, indicating the individual chains give similar results).

gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
                                Point est. Upper C.I.
B[(Intercept) (C1), sp1 (S1)]     1.009974   1.029671
B[Fire_severity (C2), sp1 (S1)]   1.013335   1.034058

Higher than 1. This confirms what we saw above in the trace plots - the chains do not mix well.
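For intuition, the PSRF essentially compares between-chain variance to within-chain variance. Here's a simplified hand-rolled version on toy chains (coda's gelman.diag is more sophisticated, but the idea is the same):

```r
set.seed(1)
# two toy 'chains' that agree, and two stuck in different regions of parameter space
good <- list(rnorm(1000, 0, 1), rnorm(1000, 0, 1))
bad  <- list(rnorm(1000, 0, 1), rnorm(1000, 3, 1))

psrf <- function(chains) {
  n <- length(chains[[1]])
  W <- mean(sapply(chains, var))     # within-chain variance
  B <- n * var(sapply(chains, mean)) # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}
psrf(good) # close to 1: chains are sampling the same distribution
psrf(bad)  # well above 1: chains disagree, so don't trust the posterior
```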

Let’s try increasing the thinning rate to improve convergence of the MCMC sampling. This will mean the model takes a lot longer to fit - it’s doing a lot more sampling because it only keeps every 10th sample to get a total of 1000 samples for each chain.

So I’m going to ask it to run the MCMC chains in ‘parallel’ to speed it up using the argument nParallel. (Note, in the remainder of the notes the code for fitting models is commented out with a # because it takes a while to fit them (try running them on your own time). So instead we provide the fitted models as RDS files that can be read in with the readRDS function - see below).

# fit the model with MCMC sampling
#m_fit <- sampleMcmc(m, nChains = 4, samples = 1000, thin = 10, transient = 2500, verbose = F, nParallel = 4)
#saveRDS(m_fit, 'model_10thinning.rds')
m_fit <- readRDS('model_10thinning.rds')
# extract posterior distributions for beta coefficients
mpost <- convertToCodaObject(m_fit)
plot(mpost$Beta)

Looks better, what about our quantitative diagnostics?

effectiveSize(mpost$Beta)
  B[(Intercept) (C1), sp1 (S1)] B[Fire_severity (C2), sp1 (S1)] 
                       924.7233                        753.0863 
gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
                                Point est. Upper C.I.
B[(Intercept) (C1), sp1 (S1)]     1.004772   1.016165
B[Fire_severity (C2), sp1 (S1)]   1.003701   1.012686

Improved - the PSRF values indicate the chains are providing similar results (as we can see in the trace plots), but the effective sample size is still low. Let’s bump up the thinning a bit more.

# fit the model with MCMC sampling
#m_fit <- sampleMcmc(m, nChains = 4, samples = 1000, thin = 100, transient = 2500, verbose = F, nParallel = 4)
#saveRDS(m_fit, 'model_100thinning.rds')
m_fit <- readRDS('model_100thinning.rds')
# extract posterior distributions for beta coefficients
mpost <- convertToCodaObject(m_fit)
plot(mpost$Beta)

Excellent, even better than before. How about the effective sample size?

effectiveSize(mpost$Beta)
  B[(Intercept) (C1), sp1 (S1)] B[Fire_severity (C2), sp1 (S1)] 
                       4000.000                        3899.963 
gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
                                Point est. Upper C.I.
B[(Intercept) (C1), sp1 (S1)]    1.0003794   1.003090
B[Fire_severity (C2), sp1 (S1)]  0.9997679   1.001205

Much better. We’re happy now that our MCMC sampling has converged, meaning that, as long as our structural model assumptions are met, we can make inference from this model. Note, however, that sometimes you might notice that earlier samples in your trace plots look different from later samples. In this case you can try increasing the transient (i.e., the initial samples that are discarded).

Checking structural model assumptions

Now we want to check the same model assumptions we have when fitting Frequentist GLMs: the model residuals are normally distributed and have homogeneity of variance.

Unfortunately our handy plot function we used in Part 1 won’t work for {Hmsc} model objects. So we have to calculate the model residuals ourselves and plot them.

preds <- computePredictedValues(m_fit) # array of fitted (predicted) values for species abundance for each of the 4000 MCMC samples
Fitted.mean <- apply(preds, FUN = mean, MARGIN = 1) # get the mean predicted value species abundance (n = 4000)
Residuals <- scale(m_fit$Y - Fitted.mean) # calculate standardised residuals
par(mfrow = c(1,2)) # set graphical parameters
plot(Fitted.mean, Residuals); abline(a = 0, b = 0) # plot the fitted vs. residuals
qqnorm(Residuals); qqline(Residuals) # qq plot

Since we are considering count data, where high dispersion can mean the errors are not normally distributed, we can also try randomised quantile residuals, which we would expect to be normally distributed even if the data are highly dispersed (though not if they are overdispersed).

a <- ppois(m_fit$Y-1, Fitted.mean)
b <- ppois(m_fit$Y, Fitted.mean)
qresids <- qnorm(runif(n = length(Fitted.mean), min = a, max = b))
par(mfrow = c(1,2)) # set graphical parameters
plot(Fitted.mean, qresids); abline(a = 0, b = 0) # plot the fitted vs. residuals
qqnorm(qresids); qqline(qresids) # qq plot

Both look about the same (so high dispersion in the count data isn’t an issue for interpreting the residuals), and they look very similar to the diagnostic plots we made in Part 1, where we fitted a Frequentist GLM to the same data.

Now let’s check the coefficient estimates are close to our known truths.

summary(mpost$Beta)

Iterations = 2600:102500
Thinning interval = 100 
Number of chains = 4 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                                   Mean      SD Naive SE Time-series SE
B[(Intercept) (C1), sp1 (S1)]    2.3526 0.07106 0.001124       0.001123
B[Fire_severity (C2), sp1 (S1)] -0.7703 0.13709 0.002168       0.002197

2. Quantiles for each variable:

                                  2.5%    25%     50%    75%  97.5%
B[(Intercept) (C1), sp1 (S1)]    2.213  2.305  2.3532  2.399  2.491
B[Fire_severity (C2), sp1 (S1)] -1.038 -0.862 -0.7712 -0.676 -0.506

Remember, these estimates are on the log scale, so we need to back-transform them to the natural scale (the inverse of the natural log is the exponential). But, as you can see, they are almost exactly the same as our Frequentist model from Part 1!

exp(summary(mpost$Beta)$statistics[1]) # back transform the y-intercept (mean species abundance when fire severity = 0)
[1] 10.51245
exp(summary(mpost$Beta)$statistics[2]) # back transform the slope
[1] 0.4628578

Multivariate Generalised linear model

Okay, so now we know how to fit a Bayesian GLM, and we can see that with the default, uninformative priors it produces the same coefficient estimates as its Frequentist analogue. It took longer to fit though, and there were more diagnostics to check (MCMC convergence in addition to standard structural model assumptions). So why go Bayesian?

Well, in the previous example, we might opt to stick with a Frequentist GLM in the interest of saving time. But what if we had observations of multiple species’ abundances that we wanted to model together? We can build upon our Bayesian model to do so, and include a site-level random effect to estimate residual correlations in species’ abundances (i.e., the correlations in species’ abundances left unexplained after accounting for the variability explained by explanatory variables (e.g., fire severity)).

You might be thinking this sounds like a constrained ordination. You would be right! Fitting multivariate GLMs is analogous to fitting a constrained ordination. The difference is that the former is a model-based approach to ordination (which means we can do things like account for lack of independence in our samples due to spatial autocorrelation - we’ll do that later), and the latter is an algorithmic approach to ordination.

For interest, we’ll fit an algorithmic unconstrained ordination (nMDS) to a multivariate dataset of species’ abundances and compare this to a multivariate GLM.

Let’s start by simulating a new dataset of species abundance data for multiple species. We’ll also make this a spatial example with longitude (x) and latitude (y) coordinates for each location where species are surveyed.

library(MASS)
set.seed(123) # set seed for random number generator to ensure results are reproducible
n <- 100 # number of observations
ns <- 5 # number of species

# simulate relationships between species abundance and fire severity
b0 <- log(rep(10, ns)) # y-intercept for each species is 10
b1 <- log(c(2,1.1,1,1.1,2))*c(-1,-1,0,1,1) # slopes for each species (spp 1&2 decrease by factor of 2 & 1.1, spp3 no change, spp 4&5 increase by factor of 1.1 &2)
beta <- cbind(b0,b1) # bind beta coefficients together
X <- cbind(rep(1,n),runif(n, min = 0, max = 1)) # fire severity measurements

# simulate spatial autocorrelation in the abundances
xycoords <- data.frame(x = runif(n, 152.1,152.5), y = runif(n, 28.1, 28.5)*-1) # make spatial coordinates for survey locations
sigma.spatial <- c(2) # standard deviation of spatial covariance
alpha.spatial <- c(0.35) # spatial scale parameter 'alpha' for spatial covariance
sigma <- sigma.spatial^2*exp(-as.matrix(dist(xycoords))/alpha.spatial) # exponentially decreasing spatial covariance matrix
eta <- mvrnorm(mu=rep(0,n), Sigma=sigma) # draw spatially structured residuals from a multivariate normal distribution (spatially structured latent variable)
beta.spatial <- c(1,2,-2,-1,0) # define how species are spatially auto-correlated (i.e., respond to spatially structured latent variable (eta))
spatial_res <- eta%*%t(beta.spatial) # calculate spatially structured residuals for each species
lambda <- exp((X%*%t(beta))) + spatial_res # add spatial residuals to mean expected abundance of each species due to fire severity

# make final data frame of simulated species abundances with spatial autocorrelation
y <- data.frame(apply(data.frame(lambda), 2, function(x) rpois(n, x))) # simulate species abundance observations from the poisson distribution
colnames(y) <- c('spp1', 'spp2', 'spp3', 'spp4', 'spp5') # add column names to data frame
df <- data.frame(xycoords, y, Fire_severity = X[,2], Spatial_latent_variable = eta) # add fire severity and spatial latent variable
head(df)
         x         y spp1 spp2 spp3 spp4 spp5 Fire_severity
1 152.3400 -28.19549    6    6   15   14   11     0.2875775
2 152.2331 -28.48494    3    7   14   19   18     0.7883051
3 152.2954 -28.34055    4    4   12   10   10     0.4089769
4 152.4818 -28.30601    2    6   11    7   22     0.8830174
5 152.2932 -28.26103    4    6   10    9   17     0.9404673
6 152.4561 -28.45210    7   12   12   10   11     0.0455565
  Spatial_latent_variable
1              -0.7804898
2              -1.8583634
3              -1.5742188
4              -1.0602457
5              -0.2548178
6              -0.1344579

What do the simulated relationships look like? Let’s do some plots. First, it will be easiest to plot if we make this data tidy. This means that, instead of having a separate column for each species’ abundance observations (wide format data), we pivot to long format, whereby we have only one column for abundance observations (so every unique observation of abundance has its own row).

Let’s try it.

library(tidyr)
df_long <- pivot_longer(df, cols = spp1:spp5, 
               names_to = 'spp', 
               values_to = 'abundance')
head(df_long)
# A tibble: 6 × 6
      x     y Fire_severity Spatial_latent_variable spp   abundance
  <dbl> <dbl>         <dbl>                   <dbl> <chr>     <int>
1  152. -28.2         0.288                  -0.780 spp1          6
2  152. -28.2         0.288                  -0.780 spp2          6
3  152. -28.2         0.288                  -0.780 spp3         15
4  152. -28.2         0.288                  -0.780 spp4         14
5  152. -28.2         0.288                  -0.780 spp5         11
6  152. -28.5         0.788                  -1.86  spp1          3

Great, now we can more easily plot a trend line for each species’ abundance in response to fire severity. We also know how spatial autocorrelation will influence species’ abundances, so we can also plot trends in species’ abundances in response to the spatial latent ‘unobserved’ variable we simulated above called eta (\(\eta\)).

library(patchwork)

Attaching package: 'patchwork'
The following object is masked from 'package:MASS':

    area
a <- ggplot(df_long) +
  aes(x = Fire_severity, y = abundance, col = spp) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = 'lm') +
  ggtitle('A) Fire Severity') +
  ylab('Species abundance') +
  xlab('Fire Severity') +
  theme_classic()
b <- ggplot(df_long) +
  aes(x = Spatial_latent_variable, y = abundance, col = spp) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = 'lm') +
  ggtitle('B) Spatial LV') +
  ylab('Species abundance') +
  xlab('Spatial Latent Variable') +
  geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.5) +
  theme_classic()
a+b
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

What trends did we expect to see? We can take the exponent of the slope (\(b{_1}\) coefficients) to get the expected multiplicative change in species abundance for every one unit increase in fire severity. If we multiply this by the y-intercept (mean abundance at fire severity = 0), that will give us the abundance we expect when fire severity = 1.

data.frame(Species = colnames(y),
           Known_truths_beta1 = exp(b1),
           Expected_abundance_FS_1 = exp(b0)*exp(b1))
  Species Known_truths_beta1 Expected_abundance_FS_1
1    spp1          0.5000000                5.000000
2    spp2          0.9090909                9.090909
3    spp3          1.0000000               10.000000
4    spp4          1.1000000               11.000000
5    spp5          2.0000000               20.000000

Pretty close for most species, although there are some deviations from what we expected (especially for species 3 - we would have expected a flat line around 10). The deviations stem from the spatial autocorrelation that we simulated in the data. But don’t worry about it for now, we’ll discuss that in more detail later.

Spatial data in R

Let’s turn our species abundance dataframe into a spatial one and map it. This is easy in R.

library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
df.sf <- st_as_sf(df, coords = c('x', 'y'), crs = 4326)
head(df.sf)
Simple feature collection with 6 features and 7 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 152.2331 ymin: -28.48494 xmax: 152.4818 ymax: -28.19549
Geodetic CRS:  WGS 84
  spp1 spp2 spp3 spp4 spp5 Fire_severity Spatial_latent_variable
1    6    6   15   14   11     0.2875775              -0.7804898
2    3    7   14   19   18     0.7883051              -1.8583634
3    4    4   12   10   10     0.4089769              -1.5742188
4    2    6   11    7   22     0.8830174              -1.0602457
5    4    6   10    9   17     0.9404673              -0.2548178
6    7   12   12   10   11     0.0455565              -0.1344579
                    geometry
1   POINT (152.34 -28.19549)
2 POINT (152.2331 -28.48494)
3 POINT (152.2954 -28.34055)
4 POINT (152.4818 -28.30601)
5 POINT (152.2932 -28.26103)
6  POINT (152.4561 -28.4521)

What’s different about the spatial dataframe compared to a normal one? Now let’s map it.

library(tmap)
tmap_mode('view')
qtm(df.sf, dots.col = 'Fire_severity', dots.palette = 'Reds', dots.size = 0.5) +
qtm(pivot_longer(df.sf, cols = spp1:spp5, names_to = 'spp', values_to = 'abundance'), dots.col = 'abundance', dots.size = 0.3) + tm_facets(by = 'spp', ncol = 2, free.scales = T)

Tip Use the layered diamonds in the top left of each map to turn the abundance layer off and see fire severity below.

An exploratory, algorithmic ordination

Before we try model-based ordination of our multivariate species abundance data, let’s try a quick exploratory nMDS to look for patterns.

library(vegan)
nmds <- metaMDS(df[,3:7], distance = 'bray', k = 2)
nmds

Call:
metaMDS(comm = df[, 3:7], distance = "bray", k = 2) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(df[, 3:7]) 
Distance: bray 

Dimensions: 2 
Stress:     0.1846236 
Stress type 1, weak ties
Best solution was repeated 3 times in 20 tries
The best solution was from try 0 (metric scaling or null solution)
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(df[, 3:7])' 

The stress is an estimate of the goodness of fit of the nMDS. It is less than 0.2, indicating a decent fit.

Let’s plot the ordination and see which survey locations are similar based on the abundance and composition of species. First we need to extract the site scores and species scores for the two nMDS dimensions.

library(dplyr)
nmds_scores <- data.frame(df, scores(nmds)$sites) %>% mutate(Fire_class = factor(ifelse(Fire_severity > 0.5, 'Severe', 'Less severe')))
spp_scores <- data.frame(scores(nmds)$species)
spp_scores$spp <- rownames(spp_scores)

Now we can plot the ordination.

ggplot() +
  geom_point(data = nmds_scores, aes(x = NMDS1, y = NMDS2, col = Fire_class)) +
  stat_ellipse(data = nmds_scores, aes(x = NMDS1, y = NMDS2, col = Fire_class), alpha = 0.5) +
  geom_text(data = spp_scores, aes(x = NMDS1, y = NMDS2, label = spp)) +
  geom_vline(xintercept = 0, lty = 'dashed', alpha = 0.3) +
  geom_hline(yintercept = 0, lty = 'dashed', alpha = 0.3) +
  theme_classic()

So this shows the patterns we expected. We could do a permutational analysis of variance (PERMANOVA) to formally test whether species abundance and composition differ between fire classes.

adonis2(nmds_scores[,3:7] ~ Fire_class, data = nmds_scores)
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = nmds_scores[, 3:7] ~ Fire_class, data = nmds_scores)
           Df SumOfSqs      R2    F Pr(>F)    
Fire_class  1   0.2767 0.12578 14.1  0.001 ***
Residual   98   1.9231 0.87422                
Total      99   2.1998 1.00000                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, the inference we can make regarding effect sizes for individual species, and the strength of the associations between species after accounting for fire severity, is limited.

So let’s try model-based ordination with a multivariate GLM instead and see if we gain any extra inference.

Note that there are many other approaches to algorithmic ordination than we’ve covered here. nMDS is a type of ‘unconstrained’ ordination, but there are other approaches for ‘constrained’ ordination which are a closer analogue to the model-based ordination we’ll try next.

Find out more about different types of ordination and how to perform them in R here.

Model-based ordination

As before, we’ll set up our model and fit it with MCMC sampling. The main difference is this time we include a site-level random effect that models the residual correlations across species as latent (unobserved) variables.

# set up random effect
studyDesign <- data.frame(sample = as.factor(1:nrow(df)))
rL <- HmscRandomLevel(units = studyDesign$sample)

# set up the model
m <- Hmsc(Y = as.matrix(df[,3:7]),
          XData = df,
          XFormula = ~ Fire_severity,
          distr = 'poisson',
          studyDesign = studyDesign, 
          ranLevels = list(sample = rL))

# fit the model with MCMC sampling
#m_fit <- sampleMcmc(m, nChains = 4, samples = 1000, thin = 100, transient = 2500, verbose = F, nParallel = 4)
#saveRDS(m_fit, 'model_100thinning_mult.rds')
m_fit <- readRDS('model_100thinning_mult.rds')
mpost <- convertToCodaObject(m_fit)
plot(mpost$Beta)

effectiveSize(mpost$Beta)
  B[(Intercept) (C1), spp1 (S1)] B[Fire_severity (C2), spp1 (S1)] 
                        4013.418                         3410.421 
  B[(Intercept) (C1), spp2 (S2)] B[Fire_severity (C2), spp2 (S2)] 
                        4266.252                         4000.000 
  B[(Intercept) (C1), spp3 (S3)] B[Fire_severity (C2), spp3 (S3)] 
                        3708.327                         3960.258 
  B[(Intercept) (C1), spp4 (S4)] B[Fire_severity (C2), spp4 (S4)] 
                        3986.681                         4000.000 
  B[(Intercept) (C1), spp5 (S5)] B[Fire_severity (C2), spp5 (S5)] 
                        4336.367                         4117.720 
gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
                                 Point est. Upper C.I.
B[(Intercept) (C1), spp1 (S1)]    1.0009088   1.003705
B[Fire_severity (C2), spp1 (S1)]  1.0004670   1.002228
B[(Intercept) (C1), spp2 (S2)]    1.0008369   1.003331
B[Fire_severity (C2), spp2 (S2)]  1.0009290   1.003532
B[(Intercept) (C1), spp3 (S3)]    1.0006774   1.002546
B[Fire_severity (C2), spp3 (S3)]  1.0006305   1.002208
B[(Intercept) (C1), spp4 (S4)]    1.0006183   1.003114
B[Fire_severity (C2), spp4 (S4)]  0.9996928   1.000535
B[(Intercept) (C1), spp5 (S5)]    1.0007510   1.004198
B[Fire_severity (C2), spp5 (S5)]  1.0011504   1.004834

Okay, so the convergence diagnostics look good. Let’s take a look at the structural model assumptions. We’ll use randomised quantile residuals, and this time we’ll use a handy custom-made function to calculate and plot these for a multivariate model.

source('helper-functions.R')
preds <- computePredictedValues(m_fit) # array of fitted (predicted) values for species abundance for each of the 4000 MCMC samples
par(mfrow = c(5,2)) # set graphical parameters
rq_resid_mult(preds, m_fit)

Species 1 looks okay, but there are some strong patterns in the residuals vs. fitted plots for species 2-5. What’s going on here?

Let’s also check whether our coefficient estimates are biased relative to the known truths. Let’s remind ourselves what those were for each species.

print(data.frame(beta, spp = colnames(y)))
        b0          b1  spp
1 2.302585 -0.69314718 spp1
2 2.302585 -0.09531018 spp2
3 2.302585  0.00000000 spp3
4 2.302585  0.09531018 spp4
5 2.302585  0.69314718 spp5
summary(mpost$Beta)$statistics
                                        Mean         SD    Naive SE
B[(Intercept) (C1), spp1 (S1)]    2.31005058 0.07640597 0.001208084
B[Fire_severity (C2), spp1 (S1)] -0.77439142 0.13745494 0.002173353
B[(Intercept) (C1), spp2 (S2)]    2.42420407 0.08133197 0.001285971
B[Fire_severity (C2), spp2 (S2)] -0.07943399 0.13410166 0.002120333
B[(Intercept) (C1), spp3 (S3)]    2.04270034 0.09325765 0.001474533
B[Fire_severity (C2), spp3 (S3)]  0.09497635 0.15264734 0.002413566
B[(Intercept) (C1), spp4 (S4)]    2.20861887 0.07732993 0.001222693
B[Fire_severity (C2), spp4 (S4)]  0.01570744 0.12828137 0.002028307
B[(Intercept) (C1), spp5 (S5)]    2.29351026 0.06392410 0.001010729
B[Fire_severity (C2), spp5 (S5)]  0.69535776 0.09946209 0.001572634
                                 Time-series SE
B[(Intercept) (C1), spp1 (S1)]     0.0012057128
B[Fire_severity (C2), spp1 (S1)]   0.0023564464
B[(Intercept) (C1), spp2 (S2)]     0.0012713177
B[Fire_severity (C2), spp2 (S2)]   0.0021209584
B[(Intercept) (C1), spp3 (S3)]     0.0015491838
B[Fire_severity (C2), spp3 (S3)]   0.0024340068
B[(Intercept) (C1), spp4 (S4)]     0.0012253985
B[Fire_severity (C2), spp4 (S4)]   0.0020282625
B[(Intercept) (C1), spp5 (S5)]     0.0009744949
B[Fire_severity (C2), spp5 (S5)]   0.0015510346

So there’s definitely some bias, although the signs of the effects are generally correct (i.e., negative vs. positive).

Let’s see if we can improve our model by including a spatial random effect to account for spatial autocorrelation.

Accounting for spatial autocorrelation

What is spatial autocorrelation? Tobler’s first law of geography sums it up: ‘Everything is related to everything else, but near things are more related than distant things’.

This means that observations of species abundance made closer together in space are more likely to be similar than observations made farther apart. In other words, our observations are not independent of each other, which violates an assumption of general(ised) linear models.
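A quick, self-contained illustration of Tobler’s law (not the course’s own simulation code): simulate values whose correlation decays exponentially with distance, then confirm that near pairs end up more similar than distant pairs.

```r
# Simulate spatially autocorrelated values on random coordinates.
set.seed(42)
n      <- 100
coords <- cbind(x = runif(n), y = runif(n))
d      <- as.matrix(dist(coords))            # pairwise distances between sites
Sigma  <- exp(-d / 0.2)                      # correlation decays with distance
z      <- drop(t(chol(Sigma)) %*% rnorm(n))  # spatially autocorrelated values
diffs  <- abs(outer(z, z, "-"))              # pairwise dissimilarity of values
near   <- d > 0 & d < 0.1
far    <- d > 0.5
mean(diffs[near]) < mean(diffs[far])         # should be TRUE: near pairs more alike
```

The distance threshold (0.2 here) controls how quickly correlation fades; it plays the same role as the alpha parameter we’ll estimate below.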

We can include a spatial random effect that will estimate and adjust for spatial autocorrelation in our species abundance observations. Let’s try it.

# set up spatial random effect
studyDesign <- data.frame(sample = as.factor(1:nrow(df)))
rL.spatial <- HmscRandomLevel(sData = df[,c(1,2)])

# set up the model
m <- Hmsc(Y = as.matrix(df[,3:7]),
          XData = df,
          XFormula = ~ Fire_severity,
          distr = 'poisson',
          studyDesign = studyDesign, 
          ranLevels = list(sample = rL.spatial))

# fit the model with MCMC sampling
#m_fit <- sampleMcmc(m, nChains = 4, samples = 1000, thin = 100, transient = 2500, verbose = F, nParallel = 4)
#saveRDS(m_fit, 'model_100thinning_mult_spatial.rds')
m_fit <- readRDS('model_100thinning_mult_spatial.rds')
mpost <- convertToCodaObject(m_fit)
plot(mpost$Beta)

effectiveSize(mpost$Beta)
  B[(Intercept) (C1), spp1 (S1)] B[Fire_severity (C2), spp1 (S1)] 
                        3959.741                         3961.783 
  B[(Intercept) (C1), spp2 (S2)] B[Fire_severity (C2), spp2 (S2)] 
                        4131.849                         3910.877 
  B[(Intercept) (C1), spp3 (S3)] B[Fire_severity (C2), spp3 (S3)] 
                        3832.808                         4231.853 
  B[(Intercept) (C1), spp4 (S4)] B[Fire_severity (C2), spp4 (S4)] 
                        3905.527                         4000.000 
  B[(Intercept) (C1), spp5 (S5)] B[Fire_severity (C2), spp5 (S5)] 
                        3884.834                         3908.808 
gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
                                 Point est. Upper C.I.
B[(Intercept) (C1), spp1 (S1)]    1.0008076   1.002355
B[Fire_severity (C2), spp1 (S1)]  0.9998634   1.001546
B[(Intercept) (C1), spp2 (S2)]    1.0023186   1.008327
B[Fire_severity (C2), spp2 (S2)]  1.0003863   1.003001
B[(Intercept) (C1), spp3 (S3)]    1.0017570   1.005108
B[Fire_severity (C2), spp3 (S3)]  1.0009815   1.004367
B[(Intercept) (C1), spp4 (S4)]    1.0002037   1.001552
B[Fire_severity (C2), spp4 (S4)]  1.0000179   1.001101
B[(Intercept) (C1), spp5 (S5)]    1.0022484   1.005973
B[Fire_severity (C2), spp5 (S5)]  1.0017947   1.006986

Now let’s see if our residual diagnostic plots look better.

preds <- computePredictedValues(m_fit) # array of fitted (predicted) values for species abundance for each of the 4000 MCMC samples
par(mfrow = c(5,2)) # set graphical parameters
rq_resid_mult(preds, m_fit)

Much better!

We should also check for spatial autocorrelation in the residuals. We’ll do that visually by mapping the residuals.

resids.df <- rq_resid_mult(preds, m_fit, save.df = T)
qresid.sf <- cbind(select(df.sf, -c(spp1:spp5)), resids.df) %>% pivot_longer(cols = spp1:spp5, names_to = 'spp', values_to = 'residuals') %>% mutate(residuals_size = abs(residuals))
tmap_mode('plot')
qtm(qresid.sf, dots.col = 'residuals', dots.size = 'residuals_size') + tm_facets(by = 'spp', ncol = 3, free.scales = T)

There isn’t any strong spatial clustering of residuals, which is good.
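If you’d rather back up the visual check with a number, Moran’s I is the standard statistic for spatial autocorrelation. Here’s a from-scratch sketch on toy residuals; in practice packages like {ape} or {spdep} provide tested implementations with significance tests.

```r
# Moran's I: a global measure of spatial autocorrelation in a variable.
morans_i <- function(res, coords) {
  d  <- as.matrix(dist(coords))
  w  <- 1 / d
  diag(w) <- 0                        # inverse-distance spatial weights
  rc <- res - mean(res)
  (length(res) / sum(w)) * sum(w * outer(rc, rc)) / sum(rc^2)
}
set.seed(1)
coords <- cbind(runif(50), runif(50))
res    <- rnorm(50)                   # independent (well-behaved) residuals
morans_i(res, coords)                 # expect ~ -1/(n - 1), i.e. near zero
```

Values well above zero would indicate leftover spatial clustering in the residuals; values near the null expectation of -1/(n - 1) are what we want.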

Now, let’s see if our coefficient estimates are closer to known truths.

print(data.frame(beta, spp = colnames(m_fit$Y)))
        b0          b1  spp
1 2.302585 -0.69314718 spp1
2 2.302585 -0.09531018 spp2
3 2.302585  0.00000000 spp3
4 2.302585  0.09531018 spp4
5 2.302585  0.69314718 spp5
summary(mpost$Beta)$statistics
                                        Mean         SD    Naive SE
B[(Intercept) (C1), spp1 (S1)]    2.27108688 0.10313308 0.001630677
B[Fire_severity (C2), spp1 (S1)] -0.73044569 0.13807296 0.002183125
B[(Intercept) (C1), spp2 (S2)]    2.30371045 0.17978449 0.002842642
B[Fire_severity (C2), spp2 (S2)]  0.04536074 0.12194183 0.001928070
B[(Intercept) (C1), spp3 (S3)]    2.16423548 0.18134522 0.002867320
B[Fire_severity (C2), spp3 (S3)] -0.02598279 0.13612260 0.002152287
B[(Intercept) (C1), spp4 (S4)]    2.27635666 0.12056401 0.001906284
B[Fire_severity (C2), spp4 (S4)] -0.05475742 0.12400776 0.001960735
B[(Intercept) (C1), spp5 (S5)]    2.27110783 0.07624168 0.001205487
B[Fire_severity (C2), spp5 (S5)]  0.71697044 0.10177380 0.001609185
                                 Time-series SE
B[(Intercept) (C1), spp1 (S1)]      0.001639748
B[Fire_severity (C2), spp1 (S1)]    0.002237641
B[(Intercept) (C1), spp2 (S2)]      0.002802497
B[Fire_severity (C2), spp2 (S2)]    0.001951068
B[(Intercept) (C1), spp3 (S3)]      0.002942687
B[Fire_severity (C2), spp3 (S3)]    0.002099697
B[(Intercept) (C1), spp4 (S4)]      0.001932249
B[Fire_severity (C2), spp4 (S4)]    0.001961076
B[(Intercept) (C1), spp5 (S5)]      0.001224401
B[Fire_severity (C2), spp5 (S5)]    0.001628120

Odd, now the coefficients seem even more biased from the known truth; in fact, the sign of the b1 coefficient for spp2 and spp4 has switched. What’s going on? When we include a random effect in our generalised linear model, the slope coefficients are no longer interpreted as the mean expected change in species abundance (\(y\)) in response to a one unit shift in fire severity (\(X\)). Instead, they represent that same change conditional on the spatial random effect. So, although the coefficients seem more biased from our known truths, they are estimates of the known effects conditional on spatial autocorrelation.

We can check whether the model has recovered the scale of spatial autocorrelation that we simulated in the data. Let’s remind ourselves of what that was:

alpha.spatial
[1] 0.35

Now let’s check that the posterior distribution for the alpha parameter contains the known truth.

plot(mpost$Alpha[[1]][,1])

Looks great. We can also get a summary for alpha, which provides the mean estimate.

summary(mpost$Alpha[[1]][,1])

Iterations = 2600:102500
Thinning interval = 100 
Number of chains = 4 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE Time-series SE 
      0.317441       0.122894       0.001943       0.002000 

2. Quantiles for each variable:

  2.5%    25%    50%    75%  97.5% 
0.1106 0.2157 0.3098 0.4149 0.5367 

Very close to 0.35, our known truth.

Now let’s plot up the effect sizes with 50% and 95% credible intervals representing uncertainty in the parameter estimates. The credible intervals indicate whether there is weak (50% credible interval does not cross 0) or strong (95% credible interval does not cross 0) evidence for an effect.
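Under the hood, these credible intervals are simply quantiles of the MCMC draws. Here’s the idea on a fake posterior sample for a single coefficient; the `extract_beta()` helper presumably does the equivalent for every coefficient in the model.

```r
# Credible intervals as quantiles of posterior draws (fake posterior sample).
set.seed(1)
draws <- rnorm(4000, mean = 0.5, sd = 0.2)  # stand-in for one coefficient's posterior
ci <- quantile(draws, probs = c(0.025, 0.25, 0.75, 0.975))
ci
# 50% CI = (25%, 75%) quantiles; 95% CI = (2.5%, 97.5%) quantiles.
# Here both intervals exclude 0, which we'd read as strong evidence of an effect.
```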

# extract beta coefficients
source('helper-functions.R') # source a custom function to extract and summarise beta coefs for plotting
beta <- extract_beta(mpost$Beta, m_fit)
`summarise()` has grouped output by 'predictor', 'response'. You can override
using the `.groups` argument.
# plot effect sizes
ggplot(beta) +
  geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.5) +
  geom_linerange(aes(xmin = CI.25, xmax = CI.75, y = response), col = 'gray55', linewidth = 2) +
  geom_linerange(aes(xmin = CI.2.5, xmax = CI.97.5, y = response), col = 'gray75', linewidth = 0.5) +
  geom_point(aes(x = Mean, y = response), size = 0.5) +
  ylab('') +
  xlab('Effect size estimate') +
  theme_classic()

We can also estimate the amount of variability in the abundance of each species explained by the fixed vs. random effects.

# partition variance explained
VP <- computeVariancePartitioning(m_fit)
vardf <- data.frame(VP$vals)
vardf$cat <- c('Fire', 'Residual correlations')
vardf.long <- pivot_longer(vardf, spp1:spp5, names_to = 'Species', values_to = 'Variance_prop')

# plot
ggplot(vardf.long) +
  aes(y = Species, x = Variance_prop, fill = cat) +
  geom_bar(stat = 'identity') +
  theme_classic() +
  ylab('') +
  xlab('Proportion variance explained') +
  theme_classic() +
  theme(legend.text = element_text(size = 8),
        legend.title = element_blank(),
        plot.margin = margin(0, 0, 0, 0, "cm"),
        plot.title = element_text(size = 10),
        legend.key.size = unit(0.4, 'cm'))

And plot the residual correlations between species (i.e., the correlations between species left over after we account for variability in abundances due to fire severity). These residual correlations could be from species interacting with one another (e.g., predator-prey interactions), or they could be from some important explanatory variables that cause species to co-vary but that we haven’t included in the model.

In this case we know that the residual correlations are the result of spatial autocorrelation in species’ abundances because we simulated them that way. In the real world you wouldn’t know what is driving the residual correlations, but the patterns you see might allow you to form some hypotheses.

library(ggcorrplot)
OmegaCor <- computeAssociations(m_fit)
ggcorrplot(OmegaCor[[1]]$mean, tl.cex = 7)

So, after accounting for variability in species abundance due to fire severity, we infer that abundances of species 1 are positively correlated with abundances of species 2, and negatively correlated with species 3 and 4. This is exactly what we expected, given how we simulated spatial autocorrelation in the data.

Now we can extract the posterior means of the \(\eta\) and \(\lambda\) parameters, which represent the site loadings (\(\eta\)) and species loadings (\(\lambda\)) on the spatial latent variables (random effects) estimated by our model. We can use these latent variables to plot an ordination. Note this will look different from the nMDS ordination we made earlier, as what we’ve done here is a constrained ordination. The nMDS is an unconstrained ordination.

spp <- colnames(m_fit$Y)
LVs <- c('Latent.variable.1', 'Latent.variable.2', 'Latent.variable.3', 'Latent.variable.4')
site_scores <- data.frame(getPostEstimate(m_fit, "Eta")$mean)
colnames(site_scores) <- LVs
species_scores <- data.frame(t(getPostEstimate(m_fit, "Lambda")$mean))
colnames(species_scores) <- LVs
species_scores$spp <- spp
site_scores <- data.frame(site_scores, Fire_severity = df$Fire_severity) %>% mutate(Fire_class = factor(ifelse(Fire_severity > 0.5, 'Severe', 'Less severe')))

Now we can plot the ordination.

ggplot() +
  geom_point(data = site_scores, aes(x = Latent.variable.1, y = Latent.variable.2, col = Fire_class)) +
  stat_ellipse(data = site_scores, aes(x = Latent.variable.1, y = Latent.variable.2, col = Fire_class), alpha = 0.5) +
  geom_text(data = species_scores, aes(x = Latent.variable.1, y = Latent.variable.2, label = spp)) +
  geom_vline(xintercept = 0, lty = 'dashed', alpha = 0.3) +
  geom_hline(yintercept = 0, lty = 'dashed', alpha = 0.3) +
  theme_classic()

Now that we’ve removed variability in species abundances due to fire severity, we can see that site species composition and abundance are no longer associated with fire severity.

Predicting and mapping species abundance everywhere

Now that we are happy with our spatial, multivariate model of species abundance, we might like to use it to make predictions of expected species abundance, given fire severity, everywhere in our survey area. In this sense you can think of our model as a joint species distribution model.

First we will generate a raster of expected fire severity by interpolating between our observations. As a reminder, we had simulated fire severity at our survey locations and it looks like this.

tmap_mode('view')
tmap mode set to interactive viewing
qtm(df.sf, dots.col = 'Fire_severity', dots.size = 0.1)

Let’s start by making a grid covering our survey area, which we can then use for interpolation. A spatial grid is what we often refer to as ‘raster’ data (the points above are referred to as ‘vector’ data). {terra} is a good R package for working with spatial raster data.

library(terra)
extent <- st_bbox(df.sf)
grid <- rast(nrows = 20, ncols = 20, xmin = extent$xmin, xmax = extent$xmax, ymin = extent$ymin, ymax = extent$ymax, crs = "epsg:4326")
grid_fire <- rasterize(df.sf, grid, 'Fire_severity', fun = mean)
tmap_mode('plot')
tmap mode set to plotting
qtm(grid_fire) + qtm(df.sf)

Now do the interpolation using thin plate splines to predict fire severity everywhere.

library(fields)
xy <- data.frame(xyFromCell(grid_fire, 1:ncell(grid_fire))) # make a dataframe of longitude and latitude coordinates for each grid cell
tps <- Tps(xy, values(grid_fire)) # fit a thin plate spline regression
Warning: 
Grid searches over lambda (nugget and sill variances) with  minima at the endpoints: 
  (GCV) Generalized Cross-Validation 
   minimum at  right endpoint  lambda  =  483.0456 (eff. df= 3.001006 )
grid_fire_i <- interpolate(rast(grid_fire), tps) # make a raster with interpolated values from the fitted tps regression
tmap_mode('view')
tmap mode set to interactive viewing
qtm(grid_fire_i)

Now we can make predictions of species abundance using the grid of interpolated fire severity. First, we need to set up our new data for making predictions by extracting fire severity values from our interpolated raster.

XData_new <- data.frame(Fire_severity = values(grid_fire_i))

Now we can prepare a gradient for making predictions that combines our new \(X\) data and values for a new spatial random effect.

gradient <- prepareGradient(m_fit, XDataNew = XData_new, sDataNew = list(sample = xy))

Finally, we’re ready to make predictions. We’ll generate an entire posterior predictive distribution (i.e., predictions for each sample from our posterior distribution of parameters), which we’ll summarise as both the mean predicted abundance (preds_abund) and the mean predicted probability of occurrence (preds_occur).

#preds_new <- predict(m_fit, Gradient = gradient, nParallel = 4)
#saveRDS(preds_new, 'model_100thinning_mult_spatial_prediction.rds')
preds_new <- readRDS('model_100thinning_mult_spatial_prediction.rds')
preds_abund <- apply(abind::abind(preds_new, along = 3), c(1,2), mean)
preds_occur <- apply(abind::abind(preds_new, along = 3), c(1,2), function(a) {mean(a > 0)})
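To see what those apply() calls are doing, here’s a toy posterior predictive array with dimensions sites x species x MCMC draws (3 x 2 x 1000 here; the abind step above just stacks the list of posterior draws into an array like this).

```r
# Summarising a toy posterior predictive array over its MCMC-draw dimension.
set.seed(1)
arr <- array(rpois(3 * 2 * 1000, lambda = 2), dim = c(3, 2, 1000))
mean_abund <- apply(arr, c(1, 2), mean)                     # mean count per site x species
p_occur    <- apply(arr, c(1, 2), function(a) mean(a > 0))  # share of draws with y > 0
dim(mean_abund)  # 3 sites x 2 species: the draw dimension is averaged away
```

So predicted occurrence is just the proportion of posterior draws in which the predicted count exceeds zero, which for a Poisson mean of 2 sits around 1 - exp(-2) ≈ 0.86.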

Now we can map the predictions everywhere. Let’s start with the first species.

grid_pred1_occur <- rast(grid, vals = preds_occur[,1])
grid_pred1_abund <- rast(grid, vals = preds_abund[,1])
qtm(grid_pred1_occur) + qtm(grid_pred1_abund)

Use the layered diamonds in the top left corner to turn the predicted abundance for species 1 off and see predicted occurrence instead. Now let’s try layering up and comparing predicted abundance and distributions for all species. We’ll use a for loop to create a raster stack of predicted abundance for each species.

stack <- list() # empty list for storing grids for each species' predictions
for (i in seq_len(ncol(preds_abund))) {
  stack[[i]] <- rast(grid, vals = preds_abund[,i])
}
stack <- do.call(c, stack) # turn list into a raster stack
names(stack) <- colnames(preds_abund)
stack
class       : SpatRaster 
dimensions  : 20, 20, 5  (nrow, ncol, nlyr)
resolution  : 0.01950348, 0.01951679  (x, y)
extent      : 152.1042, 152.4943, -28.49286, -28.10252  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
names       :    spp1,     spp2,   spp3,    spp4,     spp5 
min values  : 6.18325,  6.97425,  5.628,  7.4845, 12.22075 
max values  : 7.83625, 15.75875, 13.034, 12.0660, 15.44650 

Excellent. Now let’s map it.

qtm(stack) + tm_facets(as.layers = T, free.scales = T)

There we have it, we’ve mapped out predictions of abundance from our joint species distribution model! (Again, make sure you use the layered diamonds in the top left to turn layers on and off.)

Saving static maps

Here’s a quick example of how we can turn our interactive maps into something that is publication ready by having more control over the aesthetics than is possible with the qtm function we’ve been using previously.

tmap_mode('plot')
tmap mode set to plotting
map <- tm_shape(stack) + # Specify the shapefile or raster to map
  tm_raster(title = 'Species abundance') + # What is being displayed? Here, it's a raster
  tm_shape(df.sf) + # Add another shapefile to our map
  tm_dots( # Now we are displaying points (dots)
    title = 'Fire severity', # Title as it appears in the legend
    col = 'Fire_severity', # Colour the points by variable
    pal = 'BuGn', # Define the colour palette
    size = 0.05, 
    scale = 2) +
  tm_facets(as.layers = T) + 
  tm_layout(panel.labels = c('Sp.1', 'Sp.2', 'Sp.3', 'Sp.4', 'Sp.5'), # Define labels
            legend.position = c(-0.5, -0.05), # Customise legend
            legend.title.fontface = 'bold') +
  tm_compass(position = c(0.001, 0.82)) + # Add a north arrow
  tm_scale_bar(position = c(0.65, 0.015), # Add a scale bar
               breaks = c(0,5,10),
               text.size = 0.4)
map

To save it:

#tmap_save(map, 'map.png')

Resources

Hopefully this short course gave you a sense of how flexible Bayesian statistical approaches can be. For example, we used the same Bayesian modelling framework to:

  1. fit a linear model,
  2. fit a generalised linear model,
  3. fit a multivariate generalised linear model (which is also model-based ordination), and
  4. fit a spatial, multivariate generalised linear model (which is also a joint species distribution model).

Phew, that is a lot!

Find out more about fitting Bayesian models and using R as a GIS here: